Write Data
When working with Fused we recommend you save your files in 2 formats: Parquet & Cloud Optimized GeoTIFF (COG).
Read more about why we recommend those formats.
Table: to parquet
@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/housing/housing_2024.csv"):
import pandas as pd
housing = pd.read_csv(path)
housing['price_per_area'] = round(housing['price'] / housing['area'], 2)
processd_data = housing[['price', 'price_per_area']]
# Saving to user specific location
username = fused.api.whoami()['handle']
output_path = f"s3://fused-users/fused/{username}/housing_2024_processed.parquet"
processd_data.to_parquet(output_path)
return f"File saved to {output_path}"
Array: to Cloud Optimized GeoTIFF (COG)
@fused.udf
def udf(path: str = "s3://fused-sample/demo_data/satellite_imagery/wildfires.tiff"):
import rasterio
import numpy as np
# Read the raster data
with rasterio.open(path) as src:
data = src.read()
profile = src.profile
# Process the data
processed_data = np.where(data > np.percentile(data, 80), 255, 0).astype(np.uint8)
# Update profile for writing
profile.update({
'driver': 'GTiff',
'compress': 'lzw',
'dtype': 'uint8'
})
# Write to Fused's shared disk (accessible to all UDFs in org)
username = fused.api.whoami()['handle']
output_path = f"/mnt/cache/wildfires_processed_{username}.tif"
with rasterio.open(output_path, 'w', **profile) as dst:
dst.write(processed_data)
return f"File saved to shared disk at {output_path}"
Geo-partitioning large datasets: fused.ingest()
Large geospatial data might not be optimally formatted or partitioned. Fused offers a simple way to ingest your data at scale.
# Get your user handle
user = fused.api.whoami()['handle']
# Ingest Washington DC Census data
job = fused.ingest(
input="https://www2.census.gov/geo/tiger/TIGER_RD18/LAYER/TRACT/tl_rd22_11_tract.zip",
output=f"fd://{user}/data/census/partitioned/", # Saving to your Fused bucket
)
job.run_remote()
You can tail logs to see how the job is progressing:
fused.api.job_tail_logs("your-job-id")
Learn more about Fused data ingestion